Keeping up with the pathogens: improved antimicrobial resistance detection and prediction from Pseudomonas aeruginosa genomes

Background Antimicrobial resistance (AMR) is an intensifying threat that requires urgent mitigation to avoid a post-antibiotic era. Pseudomonas aeruginosa represents one of the greatest AMR concerns due to increasing multi- and pan-drug resistance rates. Shotgun sequencing is gaining traction for in silico AMR profiling due to its unambiguity and transferability; however, accurate and comprehensive AMR prediction from P. aeruginosa genomes remains an unsolved problem. Methods We first curated the most comprehensive database yet of known P. aeruginosa AMR variants. Next, we performed comparative genomics and microbial genome-wide association study analysis across a Global isolate Dataset (n = 1877) with paired antimicrobial phenotype and genomic data to identify novel AMR variants. Finally, the performance of our P. aeruginosa AMR database, implemented in our AMR detection and prediction tool, ARDaP, was compared with three previously published in silico AMR gene detection or phenotype prediction tools—abritAMR, AMRFinderPlus, ResFinder—across both the Global Dataset and an analysis-naïve Validation Dataset (n = 102). Results Our AMR database comprises 3639 mobile AMR genes and 728 chromosomal variants, including 75 previously unreported chromosomal AMR variants, 10 variants associated with unusual antimicrobial susceptibility, and 281 chromosomal variants that we show are unlikely to confer AMR. Our pipeline achieved a genotype-phenotype balanced accuracy (bACC) of 85% and 81% across 10 clinically relevant antibiotics when tested against the Global and Validation Datasets, respectively, vs. just 56% and 54% with abritAMR, 58% and 54% with AMRFinderPlus, and 60% and 53% with ResFinder. ARDaP’s superior performance was predominantly due to the inclusion of chromosomal AMR variants, which are generally not identified with most AMR identification tools. 
Conclusions Our ARDaP software and associated AMR variant database provides an accurate tool for predicting AMR phenotypes in P. aeruginosa, far surpassing the performance of current tools. Implementation of ARDaP for routine AMR prediction from P. aeruginosa genomes and metagenomes will improve AMR identification, addressing a critical facet in combatting this treatment-refractory pathogen. However, knowledge gaps remain in our understanding of the P. aeruginosa resistome, particularly the basis of colistin AMR. Supplementary Information The online version contains supplementary material available at 10.1186/s13073-024-01346-z.


Background
Antibiotic overuse and misuse [1] have driven the emergence of antimicrobial-resistant (AMR) pathogens globally [2]. We are now on the verge of a 'post-antibiotic era', where simple infections threaten to be untreatable with antimicrobials that once revolutionised modern medicine [3]. If unmitigated, AMR infections are predicted to cause 10 million deaths globally by 2050 and cost USD$100 trillion per annum [4].
The ESKAPE pathogen, Pseudomonas aeruginosa, represents one of the biggest AMR threats due to its intrinsic resistance towards many antibiotics, environmental ubiquity, ability to infect a wide spectrum of hosts, and high global mortality rate [5][6][7]. Accurately detecting and predicting AMR phenotype from genotype in P. aeruginosa has proven challenging [8], even using machine learning [9], with some approaches no more accurate than a coin flip [8]. A major shortcoming of current in silico AMR tools is that they largely focus on detecting AMR gene gain [10,11] and a small number of chromosomally encoded single-nucleotide polymorphisms (SNPs) [12][13][14]. However, P. aeruginosa can also evolve AMR through chromosomal insertions-deletions (indels), loss-of-function mutations (e.g. large deletions or frameshift mutations), structural variants, and copy-number variations (CNVs) [15]. Despite recent advances [11,12], most AMR tools remain limited in their scope and accuracy [16]: loss-of-function mutations, a major contributor to AMR, are largely ignored [14]; AMR databases are often not species-specific [14,17] and do not resolve to the individual antibiotic level [11]; and precursor mutations conferring reduced antimicrobial susceptibility are overlooked. These limitations are especially problematic for accurate AMR detection and prediction in pathogens encoding complex resistomes like P. aeruginosa [8].
To address this gap, we curated and validated the most comprehensive P. aeruginosa-specific AMR variant database yet, which, when used in conjunction with the Antimicrobial Resistance Detection and Prediction (ARDaP) software [14], enables high-accuracy AMR prediction from P. aeruginosa genomes. Performance of our ARDaP-compatible AMR variant database was first assessed across 1877 diverse P. aeruginosa strains ('Global Dataset') and, subsequently, across 102 analysis-naïve P. aeruginosa strains ('Validation Dataset'). Our approach, which we demonstrate far exceeds the performance of current AMR prediction software, provides a crucial stepping stone towards the routine clinical use of genomics and metagenomics to inform personalised P. aeruginosa antimicrobial treatment regimens.

Microbial genome-wide association study (mGWAS) and machine learning for AMR prediction
To identify novel AMR variants, mGWAS [104] was performed on the Global P. aeruginosa Dataset (n = 1877 strains), with SNPs and indels identified using SPANDx v4.0.1 [105]. To increase the signal-to-noise ratio, variants found in antimicrobial-sensitive isolates were penalised four-fold compared with AMR strains due to a presumed large effect size [106]. The top 50 variants associated with each AMR phenotype were assessed for their ability to improve phenotype prediction; those that improved phenotype prediction were included in the AMR database. Additionally, a supervised machine learning approach was performed using the Global Dataset and the AMR database as features for model creation.
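The four-fold penalty and top-50 shortlist described above can be illustrated with a minimal Python sketch. The scoring formula used here (AMR carriers minus four times sensitive carriers) is an assumption made for illustration only, as are the function name `rank_variants`, the phenotype codes "R" and "S", and the placeholder variant names; the mGWAS method itself [104] is not reproduced.

```python
from collections import Counter

SENSITIVE_PENALTY = 4  # four-fold penalty stated in the text; the formula below is assumed


def rank_variants(strain_variants, phenotypes, top_n=50):
    """Rank variants by (AMR carriers) - 4 * (sensitive carriers).

    strain_variants: dict mapping strain name -> iterable of variant names
    phenotypes: dict mapping strain name -> "R" (AMR) or "S" (sensitive)
    Returns the top_n (variant, score) pairs, highest score first.
    """
    amr_counts, sens_counts = Counter(), Counter()
    for strain, variants in strain_variants.items():
        target = amr_counts if phenotypes[strain] == "R" else sens_counts
        target.update(set(variants))  # count each variant once per strain
    scores = {
        v: amr_counts[v] - SENSITIVE_PENALTY * sens_counts[v]
        for v in set(amr_counts) | set(sens_counts)
    }
    # Sort by descending score, then by name so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Hypothetical toy data: variant_A is seen only in AMR strains,
# variant_B is seen in one AMR and one sensitive strain.
strains = {"s1": ["variant_A", "variant_B"], "s2": ["variant_A"],
           "s3": ["variant_B"], "s4": []}
pheno = {"s1": "R", "s2": "R", "s3": "S", "s4": "S"}
ranking = rank_variants(strains, pheno)
```

In this toy case the variant that also occurs in a sensitive strain falls to the bottom of the ranking, which is the intended effect of penalising variants found in antimicrobial-sensitive isolates.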

Comparative genomic analysis
To identify additional novel AMR variants, we conducted a comparative genomic analysis using SPANDx, with a focus on AMR strains that did not encode a known AMR variant (i.e. false negatives). These strains were compared to their closest antimicrobial-sensitive relative(s) as determined by the whole-genome phylogenetic analysis (Fig. S1). SNPs and indels that separated AMR from antimicrobial-sensitive strain/s were identified, annotated, and manually investigated to prioritise mutations in known AMR genes. Candidate variants were then tested against the Global Dataset to determine whether they improved phenotype prediction. AMR variants that increased balanced accuracy (bACC) were included in the database; those that did not alter, or that decreased bACC, were discarded.
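The retention rule for candidate variants (keep only if bACC increases) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: predictions are treated as binary (AMR or sensitive), a strain carrying the candidate is simply re-predicted as AMR, and the function names `balanced_accuracy` and `keep_candidate` are hypothetical rather than part of ARDaP.

```python
def balanced_accuracy(truth, pred):
    """Mean of sensitivity and specificity for boolean lists (True = AMR)."""
    tp = sum(t and p for t, p in zip(truth, pred))
    tn = sum((not t) and (not p) for t, p in zip(truth, pred))
    fn = sum(t and (not p) for t, p in zip(truth, pred))
    fp = sum((not t) and p for t, p in zip(truth, pred))
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def keep_candidate(truth, baseline_pred, carriers):
    """Keep a candidate variant only if adding it strictly increases bACC.

    carriers: boolean list, True where the strain carries the candidate.
    """
    new_pred = [b or c for b, c in zip(baseline_pred, carriers)]
    return balanced_accuracy(truth, new_pred) > balanced_accuracy(truth, baseline_pred)


# Hypothetical toy data: two AMR strains (one missed at baseline), two sensitive.
truth = [True, True, False, False]
baseline = [True, False, False, False]

helpful = keep_candidate(truth, baseline, [False, True, False, False])
harmful = keep_candidate(truth, baseline, [False, False, True, False])
neutral = keep_candidate(truth, baseline, [True, False, False, False])
```

Only the first candidate, which rescues the false-negative strain without adding false positives, passes the filter; candidates that leave bACC unchanged or lower it are discarded, matching the rule stated above.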

AMR prediction analysis
AMR prediction was performed using our P. aeruginosa AMR database (version 1.0), implemented in ARDaP [14].
ARDaP was chosen as it is the only AMR software that can detect all mutation types (i.e. SNPs, indels, gene gain, gene loss, frameshift mutations, structural variants, and CNVs) [14]. ARDaP also has a built-in feature that automatically generates a clinician-friendly antimicrobial susceptibility summary report for each strain (Fig. S3) to simplify in silico AMR interpretation [14]. ARDaP performance was compared against four tools for AMR phenotype prediction and/or AMR variant identification: abritAMR [107], RGI v5.1.0 and CARD v3.0.9 [11], ResFinder v4.1 [10], and AMRFinderPlus v3.8.28 [108]. As abritAMR and AMRFinderPlus frequently report predicted AMR phenotypes to the antibiotic class level only, we chose to interpret AMR variant presence for a given class as conferring AMR towards all antibiotics within that class. Importantly, AMRFinderPlus is not intended for clinical phenotype prediction [109] but has been included as a benchmark for gene detection accuracy. For the purposes of software comparisons, gene identification by AMRFinderPlus was interpreted as conferring phenotypic AMR.
A variant scoring scheme has previously been described by Cortes-Lara and colleagues, which employed a 0 (no effect) through 1 (EUCAST AMR) scale to predict in silico AMR profiles [98]. We expanded upon this scheme by providing an automated weighted score for all AMR variants in our database that quantifies their contribution, positive or negative, towards AMR development (Dataset 1, 'Threshold' column); this score is recorded for each antibiotic on ARDaP's automatically generated clinician-friendly report (Fig. S3), unlike the Cortes-Lara scheme, which requires manual scoring for each antibiotic and strain [98]. Using our scoring system, variants known to cause AMR in isolation score as 100%, whereas AMR variants known to confer AMR in a stepwise manner (that is, only when in combination with other variant/s), or that only result in intermediate resistance, are given a lower score (e.g. 40-50%). This method accounts for both the additive nature of chromosomal mutations in P. aeruginosa and for the decreased AMR potential caused by loss of efflux pumps or essential transcriptional regulators. For the purposes of phenotype prediction, acquired AMR genes identified by ResFinder within ARDaP were considered to confer full AMR, i.e. given a threshold score of 100.
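The additive nature of the weighted scores can be illustrated with a short Python sketch. It assumes, for illustration only, that per-variant scores for one antibiotic are simply summed and that a total of 100 or more is read as predicted AMR; the function name and the example scores are hypothetical, and ARDaP's actual aggregation logic may differ.

```python
def is_predicted_resistant(variant_scores, threshold=100):
    """Sum per-variant scores for one antibiotic and compare to the threshold.

    variant_scores: list of numeric scores (percent); negative values model
    variants that reduce AMR potential (e.g. loss of an efflux pump).
    """
    return sum(variant_scores) >= threshold


# Hypothetical examples for a single antibiotic:
single_full = is_predicted_resistant([100])        # one variant sufficient alone
stepwise_pair = is_predicted_resistant([50, 50])   # two stepwise variants combined
stepwise_one = is_predicted_resistant([40])        # one stepwise variant alone
offset = is_predicted_resistant([100, -60])        # full-score variant offset by a negative one
```

Under these assumptions, a single stepwise variant does not reach the threshold on its own, whereas two in combination do, and a negatively scored variant can pull an otherwise resistant prediction back below the threshold.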

Intermediate resistance prediction
The capacity of our AMR variant database to predict intermediate resistance phenotypes was examined against the Global Dataset using the following criteria:
• A true-positive prediction occurred when either (i) an AMR strain was classed as AMR or (ii) an intermediate strain was identified as intermediate;
• A true-negative prediction occurred when an antimicrobial-sensitive strain was identified as antimicrobial-sensitive;
• A false-positive prediction occurred when either (i) an antimicrobial-sensitive strain was classed as intermediate or AMR or (ii) an intermediate strain was classed as AMR; and
• A false-negative prediction occurred when either (i) an intermediate strain was classed as antimicrobial-sensitive or (ii) an AMR strain was predicted to be antimicrobial-sensitive or intermediate.
These rules provided the strictest evaluation criteria for the assessment of ARDaP's ability to identify intermediate resistance strains. Only ARDaP's intermediate resistance prediction performance was assessed as abritAMR, AMRFinderPlus, CARD, and ResFinder all lack the capacity to identify intermediate resistance.
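The four criteria above reduce to an ordering of the three phenotype classes, which the following Python sketch makes explicit. The single-letter codes "S", "I", and "R" and the function name `outcome` are assumptions introduced for illustration.

```python
# Order phenotypes from least to most resistant.
ORDER = {"S": 0, "I": 1, "R": 2}


def outcome(observed, predicted):
    """Classify one prediction as TP, TN, FP, or FN under the strict criteria.

    observed / predicted: "S" (sensitive), "I" (intermediate), or "R" (AMR).
    """
    o, p = ORDER[observed], ORDER[predicted]
    if o == p:
        # Exact match: sensitive is a true negative, I or R a true positive.
        return "TN" if o == 0 else "TP"
    # Over-calling resistance is a false positive, under-calling a false negative.
    return "FP" if p > o else "FN"


# Enumerate all nine observed/predicted combinations.
table = {(o, p): outcome(o, p) for o in "SIR" for p in "SIR"}
```

Any over-call of resistance counts as a false positive and any under-call as a false negative, so an intermediate strain is only scored as correct when it is predicted as intermediate.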

Pseudomonas-derived cephalosporinase (bla PDC ) genotypes
Amino acid variants of the chromosomally encoded P. aeruginosa ampC β-lactamase gene are synonymously referred to as bla PDC variants in some literature [110] [36], including in CARD outputs [11]. To enable genotype correlations with the bla PDC nomenclature scheme, we incorporated the 476 CARD-described bla PDC variants into ARDaP's ResFinder database. These genotypes are output by default into /Outputs/Resfinder for each isolate included in the analysis.

AMR software predictive performance in P. aeruginosa
Due to its inability to predict AMR towards individual antibiotics, and a very high rate of false-positive predictions in the Global Dataset, CARD was deemed unsuitable for P. aeruginosa AMR analysis and was thus excluded from further assessment. For all other tools, predictive performance was determined using bACC [111,112], which averages sensitivity [i.e. true positives/(true positives + false negatives)] and specificity [i.e. true negatives/(true negatives + false positives)]. This metric was chosen as it accounts for dataset imbalance, that is, it minimises over- or under-representation of antimicrobial-sensitive or AMR strains that may otherwise make certain tools appear better or worse due to inherent dataset bias [8]. Additionally, we compared recall (AMR) [true positives/(true positives + false negatives)], precision (AMR) or positive predictive value [true positives/(true positives + false positives)], recall (sensitivity) [true negatives/(true negatives + false positives)], and precision (sensitivity) or negative predictive value [true negatives/(true negatives + false negatives)] across all software tools.
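The metrics defined above can be computed from the four confusion-matrix counts, as in the following minimal Python sketch. The function name `performance_metrics`, the dictionary keys, and the example counts are hypothetical and chosen for illustration; a ratio with a zero denominator is returned as 0.0 here, which is a simplifying assumption.

```python
def _ratio(numerator, denominator):
    """Safe division; returns 0.0 when the denominator is zero (assumption)."""
    return numerator / denominator if denominator else 0.0


def performance_metrics(tp, fp, tn, fn):
    """Return bACC plus precision/recall for both the AMR and sensitive classes."""
    recall_amr = _ratio(tp, tp + fn)        # sensitivity
    recall_sensitive = _ratio(tn, tn + fp)  # specificity
    return {
        "bACC": (recall_amr + recall_sensitive) / 2,
        "recall_AMR": recall_amr,
        "precision_AMR": _ratio(tp, tp + fp),        # positive predictive value
        "recall_sensitive": recall_sensitive,
        "precision_sensitive": _ratio(tn, tn + fn),  # negative predictive value
    }


# Hypothetical counts for one antibiotic.
example = performance_metrics(tp=80, fp=10, tn=90, fn=20)

# A tool that calls every strain sensitive: all AMR strains become false negatives.
all_sensitive = performance_metrics(tp=0, fp=0, tn=50, fn=50)
```

The second example shows why a tool that never predicts AMR scores a bACC of 0.5 regardless of how many sensitive strains it labels correctly: specificity is 1 but sensitivity is 0, so the average is the same as for a coin flip.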

P. aeruginosa AMR variant identification and refinement
An extensive literature search was undertaken to identify all known and putative chromosomal variants that lead to AMR in P. aeruginosa. Among 643 previously reported chromosomal AMR variants in known AMR loci (Dataset 1), 362 (56.3%) were confirmed to be associated with AMR, whereas 281 (43.7%) were re-classified as 'natural variation' (Table S3) as they were common in both antimicrobial-sensitive and AMR strains in our Global Dataset and therefore deemed unlikely to contribute to an AMR phenotype. Most of these naturally occurring variants had been previously reported as putatively causing AMR, with little to no functional investigation. Importantly, no functionally validated AMR-driving variants were re-classified as 'natural variation'. The loss of a further 10 genes was associated with unusual antimicrobial sensitivity (Dataset 1). Next, using mGWAS and comparative genomic analyses of the Global Dataset, we identified 75 previously unreported AMR variants associated with one or more AMR phenotypes (Table 2). In total, our P. aeruginosa database contains 437 chromosomal AMR variants, 281 natural variants, and 10 genes associated with unusual sensitivity (Dataset 1), along with 3639 refined mobile AMR genes derived from ResFinder.
The best abritAMR, AMRFinderPlus, and ResFinder predictions were achieved for the aminoglycosides, with average bACCs of 75%, 75%, and 84%, respectively, although these rates were lower than ARDaP's average aminoglycoside bACC of 92%. abritAMR, AMRFinderPlus, and ResFinder AMR prediction for all other antibiotics showed poor to very poor bACCs. AMRFinderPlus had a bACC of just 50% for the penicillins, cephalosporins, and colistin (the same predictive capacity as a coin flip) and had only a slightly better bACC for ciprofloxacin (58%), imipenem (58%), and meropenem (61%). ResFinder also had a bACC of just 50% for cephalosporins and colistin, and performed worse for carbapenems (average bACC of 50%) than AMRFinderPlus, although it was better for penicillins (average bACC of 62%) and ciprofloxacin (bACC of 62%). abritAMR was the worst at predicting penicillin phenotypes, with an average bACC of just 43%, worse than a coin flip; its performance was otherwise identical to AMRFinderPlus. In contrast, ARDaP surpassed abritAMR, AMRFinderPlus, and ResFinder across all 10 antibiotics, with bACCs ranging from 60% (colistin) to 94% (tobramycin) (Fig. 1A).
We next assessed the role of natural variants on ARDaP's AMR prediction performance. As expected, exclusion of natural variation resulted in a lower bACC for some antibiotics, most notably for the carbapenems, which dropped from 89% to 70%, and ciprofloxacin, which dropped from 93% to 56%. This loss of accuracy was caused by an increase in oprD and mexT false positives that were incorrectly predicted to confer carbapenem and ciprofloxacin AMR, respectively. In comparison to other AMR prediction/gene identification tools, the increase in ARDaP's predictive performance was predominantly due to accurate identification of chromosomal SNP and indel variants. For the carbapenems, this increase was due to the identification of loss-of-function mutations in oprD, with all tools successfully identifying other non-chromosomal variants (e.g. bla VIM). A similar trend was also observed for the other antibiotics. For example, the increase in CIP accuracy was due to ARDaP identifying gyrA mutations, which were not identified by other tools.
Abbreviations: AMK amikacin, CAZ ceftazidime, CIP ciprofloxacin, CST colistin, FEP cefepime, fs frameshift, i intermediate resistant, IPM imipenem, LOF loss of function, MEM meropenem, NA not applicable, PIP piperacillin, r resistant, TOB tobramycin, TZP piperacillin/tazobactam
1 Previously predicted to cause PIPr [36]; however, our microbial genome-wide association study (mGWAS) analysis did not identify a significant association with this phenotype. Instead, mGWAS showed that this AMR variant was significantly associated with MEMr and IPMr
2 Previously identified variant in ampC known to reduce susceptibility to multiple cephalosporins

Predictive performance across the Validation Dataset
We next tested abritAMR, AMRFinderPlus, ARDaP, and ResFinder across the Validation Dataset of 102 Australian clinical P. aeruginosa strains (Table S2) to determine each software's performance in an analysis-naïve dataset. As no strains in the Validation Dataset displayed colistin AMR, the bACC for this antibiotic could not be assessed. These strains otherwise exhibited similar AMR rates to the Global Dataset, ranging from 26% for meropenem to 57% for piperacillin (Table S4).

Inclusion of novel AMR variants identified in the Global Dataset
The inclusion of these markers increased Validation Dataset sensitivity by an average of 4% (range 0 to 27%) depending on antibiotic, with the sensitivity of most antibiotics (meropenem, imipenem, ciprofloxacin, cefepime, and piperacillin/tazobactam) remaining unchanged. Amikacin increased the most (27%) due to the inclusion of a SNP in rplB (Gly138Ser), followed by tobramycin at 4%, and piperacillin and ceftazidime at 3% each.

ARDaP performance between the Global and Validation Datasets
Whilst ARDaP bACCs were broadly similar between the datasets, there was a greater proportion of false-positive and false-negative variants encoding AMR towards piperacillin (32% difference), tobramycin (24% difference), cefepime (19% difference), amikacin (12% difference), and meropenem (9% difference) in the Validation Dataset. In contrast, there was a greater proportion of false-positive and false-negative variants encoding amikacin AMR (5% difference) in the Global Dataset (Fig. 1).
Comparative genomic analysis of Validation Dataset isolates that yielded false-negative aminoglycoside AMR predictions identified that many belonged to a single multilocus sequence type (ST), ST801, also known as AUST-06. Among 23/24 aminoglycoside-AMR ST801 isolates, a clade-specific missense variant in elongation factor G (FusA1 S459F) was identified; this SNP was not observed in other Global or Validation Dataset isolates. The remaining aminoglycoside-AMR ST801 strain, SCHI0010.S.1, encoded AAC(6')-IIa, an aminoglycoside-modifying enzyme. Inclusion of FusA1 S459F into our AMR database significantly increased ARDaP bACCs for the Validation Dataset by an average 19% for both amikacin and tobramycin, raising them to 95% and 90%, respectively, with no impact on Global Dataset bACC.
abritAMR, AMRFinderPlus, and ResFinder all yielded AMR precision and recall values of 0% for colistin; in other words, none of these tools identified colistin AMR variants in strains exhibiting a colistin AMR phenotype. In comparison, ARDaP identified colistin AMR strains with 21% recall and 100% precision. Similarly, abritAMR, AMRFinderPlus, and ResFinder all failed to predict cefepime sensitivity in any cefepime-sensitive strain (Fig. 2B-D), instead erroneously classing every strain as cefepime-resistant, whereas ARDaP correctly identified cefepime-sensitive strains with 96% precision and 96% recall (Fig. 2A). All three tools also failed to predict ceftazidime sensitivity. In addition, abritAMR and AMRFinderPlus failed to identify piperacillin sensitivity in any of the tested strains, and AMRFinderPlus failed to identify piperacillin/tazobactam sensitivity (Fig. 2B and Fig. 2C).

Discussion
The increasing role of high-throughput sequencing in the clinic has driven the concomitant development of bioinformatic tools for AMR variant detection and antimicrobial phenotype prediction [113]. However, current gold standard AMR tools are limited in their accuracy and performance due to their heavy focus on AMR gene gain rather than AMR-conferring chromosomal variants and their inability to detect the gamut of genetic mutations that can confer AMR (i.e. indels, CNVs, large deletions, frameshift mutations, and structural variants) [14]. In addition, most tools have primarily focused on AMR gene detection rather than AMR phenotype prediction. These shortcomings become acutely evident when attempting to predict AMR in pathogens with complex resistomes like P. aeruginosa [8].
To address this issue, we first constructed a comprehensive and accurate database of AMR variants encoded by P. aeruginosa. Our database of 728 chromosomal variants (Dataset 1) comprises 362 previously identified variants that we confirmed were significantly associated with one or more AMR phenotypes, 75 previously unreported AMR-conferring variants (Table 2), 281 variants that we classed as natural variants due to their non-significant association with AMR strains (Table S3), and 10 loci that conferred unusual antimicrobial susceptibility. Natural variants were included in our database for three reasons: (i) to allow ARDaP's coverage algorithm to scan known AMR genes for novel, high-consequence mutations (e.g. frameshift mutations in oprD that lead to carbapenem AMR) whilst avoiding variants that do not impact function, (ii) to substantially reduce the legwork involved in identifying putative novel AMR variants using mGWAS, machine learning, or comparative genomics, and (iii) to provide a valuable resource for minimising erroneous AMR variant reporting in future P. aeruginosa AMR variant discovery studies.
Next, performance assessment of our ARDaP-compatible P. aeruginosa AMR database against the Global and Validation Datasets showed that our tool outstripped the predictive performance of current 'gold standard' AMR software across all 10 antibiotics, yielding average bACCs of 85% and 81%, respectively. In comparison, average bACCs for the Global and Validation Datasets were just 56% and 54% for abritAMR, 58% and 54% for AMRFinderPlus, and 60% and 53% for ResFinder, respectively (Fig. 1). This performance difference is due to our chromosomal AMR variant database, coupled with our comprehensive comparative genomics pipeline, which identified all types of chromosomal variation and linked these variants with individual antibiotic phenotypes (Dataset 1). ARDaP also demonstrated superior precision and recall metrics for both antimicrobial-sensitive and AMR strains across all 10 tested antibiotics (Fig. 2). Our findings concur with a recent study of 654 P. aeruginosa genomes, which also found that CARD and ResFinder exhibited poor AMR prediction performance metrics across all 11 tested anti-pseudomonal antibiotics [8].
Fig. 2 Precision and recall of ARDaP, abritAMR, AMRFinderPlus, and ResFinder software across the Global Dataset (n = 1877 strains). Precision and recall metrics for both antimicrobial-sensitive and antimicrobial-resistant (AMR) strains were highest using ARDaP (A; range 73-96%) vs. abritAMR (B; range 54-62%), AMRFinderPlus (C; range 54-62%) and ResFinder (D; range 42-68%). To enable software comparisons, isolates with intermediate resistance were removed prior to analysis. Abbreviations: AMK, amikacin; CAZ, ceftazidime; CIP, ciprofloxacin; CST, colistin; FEP, cefepime; FQs, fluoroquinolones; IPM, imipenem; MEM, meropenem; PIP, piperacillin; TZP, piperacillin/tazobactam; TOB, tobramycin. N.B. Precision (AMR) is also known as positive predictive value, and precision (sensitivity) is also known as negative predictive value
Although still inferior to ARDaP, abritAMR, AMRFinderPlus, and ResFinder performed best when predicting aminoglycoside phenotypes in the Global Dataset (Fig. 1A), which is heavily populated with American and European strains (Table S1). However, these tools performed substantially worse when tested against the Australian Validation Dataset (Fig. 1B), with the average aminoglycoside bACC dropping by 16% for AMRFinderPlus and 26% for ResFinder. ARDaP's bACC also initially dropped by 18% for the aminoglycosides. Upon closer inspection, we found that this performance reduction was predominantly due to false-negative calls among the ST801 isolates, a geographically restricted clone that has only been reported in people with cystic fibrosis (CF) in Queensland, Australia [114]. Inclusion of one novel fusA1 variant, identified with comparative genomics, restored ARDaP's bACCs to 90% and 95% for amikacin and tobramycin, respectively. This performance difference across isolate datasets can be attributed to two phenomena. The first is the predominance of aminoglycoside-modifying enzymes in the Global (33%) but not Validation (8%) Datasets, reflecting potential major differences in the geographic prevalence of these enzymes that require further exploration. The second is the enrichment of CF-derived isolates in the Validation Dataset, which comprise 86% of the aminoglycoside AMR strains (Fig. S1). These isolates have largely developed aminoglycoside AMR via chromosomal mutation rather than aminoglycoside-modifying enzyme acquisition; as such, abritAMR, AMRFinderPlus, and ResFinder exhibited poor aminoglycoside AMR predictive capacity due to their limited chromosomal AMR variant databases. These performance differences highlight the need for including isolates from diverse sources, disease states, and locales to provide the most relevant AMR prediction software benchmarking comparisons. Our results suggest that abritAMR, AMRFinderPlus, and ResFinder are not useful for predicting aminoglycoside AMR from CF-derived P. aeruginosa, particularly in the Australian context, although this finding requires further exploration across larger, geographically diverse datasets.
Our findings revealed important weaknesses in abritAMR, AMRFinderPlus, and ResFinder when used for phenotype prediction. All three tools yielded bACCs of just 50% for cephalosporin prediction, abritAMR and AMRFinderPlus yielded a bACC of just 50% for piperacillin (Fig. 1A and Fig. 1B), and abritAMR performed worse than a coin flip for predicting piperacillin/tazobactam phenotypes, with a bACC of just 35%. This underperformance was largely attributed to sensitive isolates being predicted as AMR (Fig. 2). The inferior performance of abritAMR and AMRFinderPlus relative to ResFinder was further exacerbated by their software design; for most anti-pseudomonal antibiotics, these tools only predict phenotypes to the antibiotic class level. To facilitate direct software comparisons, AMR identified for a given antibiotic class by abritAMR and AMRFinderPlus was extrapolated to all antibiotics within that class, which likely led to higher imprecision or error due to differences in within-class antibiotic spectrum of activity. For instance, the poor abritAMR piperacillin/tazobactam bACC may be attributed to this tool only reporting 'β-lactamase' presence; however, the impact of this β-lactamase on piperacillin/tazobactam efficacy is not explicitly reported due to insufficient granularity. Based on our and others' [8] collective findings, we strongly discourage the use of abritAMR, AMRFinderPlus, or ResFinder for in silico cephalosporin AMR prediction in P. aeruginosa as none of these tools are currently capable of accurately differentiating sensitive from AMR strains for these antibiotics. Furthermore, abritAMR and AMRFinderPlus should not be used to predict penicillin susceptibility phenotypes in P. aeruginosa due to their insufficient resolution.
Colistin prediction proved the most challenging of the 10 tested antibiotics, yielding bACCs of 50% with abritAMR, AMRFinderPlus, and ResFinder, and 60% with ARDaP (Fig. 1). ARDaP was the only tool capable of correctly predicting some colistin AMR strains in the Global Dataset (Fig. 2A); the other three tools erroneously classified all P. aeruginosa strains as colistin-sensitive (Fig. 2B-2D). Accurate colistin prediction may have been hampered by the purported unreliability of gradient diffusion methods (e.g. disc diffusions and ETESTs) to accurately measure colistin breakpoints due to poor antibiotic diffusion and Mueller-Hinton agar manufacturer differences [115,116]. In any case, our study highlights a major gap in understanding the basis of colistin AMR, and underscores the need for much more work in this area, especially given the increasing use of inhaled colistin in treatment-refractory, multidrug-resistant P. aeruginosa infections [117,118].
Loss-of-function mutations affecting the specialised porin, OprD, are the most common cause of carbapenem AMR in P. aeruginosa, particularly in clinical isolates [9]. Although oprD is notoriously hypervariable [119], ARDaP's ability to accurately identify functional OprD loss accounted for its high carbapenem bACCs (average of 88% and 84% in the Global and Validation Datasets, respectively, vs. just 60% and 55% for abritAMR, 59% and 55% for AMRFinderPlus, and 50% and 49% for ResFinder, respectively; Fig. 1). This outcome highlights the complex nature of the P. aeruginosa resistome, the necessity of manual AMR variant curation efforts, the value of AMR prediction tools that can accurately detect the spectrum of chromosomal variants, and the need for species-specific AMR variant databases to achieve the most accurate AMR predictions.
Mutations leading to chromosomal cephalosporinase (ampC) overexpression are an important cause of β-lactam AMR [15,120]. To predict ampC upregulation, our database includes function-altering mutations in genes known to directly or indirectly regulate ampC. These chromosomally encoded variants, alongside acquired cephalosporinases, accounted for the higher average ARDaP bACC observed for cephalosporins (84% and 70% in the Global and Validation Datasets, respectively, vs. just 50% with abritAMR, AMRFinderPlus, and ResFinder across both Datasets; Fig. 1). The prominence of ampC overexpression-associated variants provides further support that this mechanism is a major cause of acquired cephalosporin AMR, particularly in clinical P. aeruginosa isolates [120,121]. In support of this hypothesis, Khaledi and colleagues demonstrated a considerably higher bACC for ceftazidime AMR prediction when using both transcriptomic and genomic data (82%) compared with just genomic data (67%) [9]. Genomics alone cannot currently identify all instances of ampC overexpression, either because upregulation is the result of an epigenetic change or because the mutation remains cryptic due to an incomplete understanding of ampC regulatory mechanisms. Indeed, a review of intrinsic β-lactamases by Juan and colleagues details the complexity of ampC expression and its intricate regulation, along with the challenge of corresponding elevated β-lactamase MICs driven by ampC upregulation to clinical AMR breakpoints [122]. Using a combination of genomic and transcriptomic data will likely lead to further improvements in AMR prediction for most antibiotics [9,120].
Predicting intermediate resistance is exceedingly difficult from genomic data alone, even with complex machine learning algorithms that combine transcriptomic and genomic data [9]. We also encountered difficulties in predicting intermediate resistance, with the inclusion of intermediate strains dropping bACCs by up to 13% (Table S5). Possible explanations include the need to understand the contribution of stepwise variants in conferring decreased antibiotic susceptibility [15,65,123], subtle and rapidly reversible gene expression alterations [9] caused by methylation [124] or dynamic environmental stimuli, and undetected strain mixtures. Further refinement of our ARDaP-compatible database, such as capacity to analyse RNA-seq data, will continue to improve this critical yet understudied area. Nevertheless, the pioneering capacity of ARDaP to predict intermediate resistance, including stepwise mutations that lower the barrier to full AMR development, and to differentiate strain mixtures in metagenomic data has important implications for detecting emerging AMR in P. aeruginosa and informing earlier treatment shifts [14].
Errors introduced during sample collection, metadata curation, specimen processing, or sequencing may be partially responsible for our inability to predict AMR with a 100% bACC for any antibiotic. For example, 11 strains in the Khaledi et al. dataset [9] possessed variants known to confer ceftazidime AMR (e.g. blaVIM-[2, 4, 45], blaOXA-2, blaGES-[1, 5]) yet were reported as ceftazidime-sensitive, and 22 strains in the Kos et al. dataset were amikacin-sensitive yet possessed the aminoglycoside-modifying enzyme gene aac(6')-Ib-cr, known to cause amikacin AMR and reduced ciprofloxacin susceptibility [125]. Due to the presence of these known AMR variants, all tools identified these strains as AMR, contributing to imperfect bACC (Fig. 1) and poor precision (Fig. 2) for amikacin and ceftazidime. As we did not have access to these strains, it was not possible to retest their AMR phenotypes or to repeat genome sequencing; however, we hypothesise that these strains would generate different results upon retesting. Alternatively, these AMR variants may be present but functionally or transcriptionally inactive, resulting in false-positive predictions for these isolates that must be factored into future AMR prediction estimates.
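To illustrate how phenotype discrepancies of this kind feed into the reported metrics, the following minimal sketch assumes the standard definitions of balanced accuracy (the mean of sensitivity and specificity) and precision. All counts, and the names `balanced_accuracy`, `precision`, `TRUE_LABELS`, and `RECORDED_LABELS`, are hypothetical and chosen only for illustration; they are not taken from the Global or Validation Datasets.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Balanced accuracy: the mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)  # resistant strains correctly called resistant
    specificity = tn / (tn + fp)  # sensitive strains correctly called sensitive
    return (sensitivity + specificity) / 2


def precision(tp: int, fp: int) -> float:
    """Fraction of 'resistant' predictions that match the recorded phenotype."""
    return tp / (tp + fp)


# Hypothetical confusion-matrix counts (NOT study data).
TRUE_LABELS = dict(tp=40, fn=10, tn=90, fp=10)

# Assumption: 11 strains carrying a known AMR gene are recorded as sensitive.
# The tool still predicts "resistant", so each one moves from TP to FP.
N_MISLABELLED = 11
RECORDED_LABELS = dict(
    tp=TRUE_LABELS["tp"] - N_MISLABELLED,
    fn=TRUE_LABELS["fn"],
    tn=TRUE_LABELS["tn"],
    fp=TRUE_LABELS["fp"] + N_MISLABELLED,
)

for name, c in (("true labels", TRUE_LABELS), ("recorded labels", RECORDED_LABELS)):
    bacc = balanced_accuracy(**c)
    prec = precision(c["tp"], c["fp"])
    print(f"{name}: bACC={bacc:.3f}, precision={prec:.3f}")
```

In this toy scenario both metrics fall even though the genotype calls are unchanged, which is consistent with the explanation above that discordant phenotype records can depress apparent performance for every tool.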
We recognise several study limitations. First, some false-positive predictions were identified across all antibiotic classes with our ARDaP-compatible database; however, these rates were significantly lower than those reported by other software (Fig. 1). Whilst not ideal, we chose to retain a small number of AMR variants that result in low-frequency false-positive predictions as (i) some strains may have reverted to a sensitive phenotype, despite encoding a known AMR variant, and (ii) we included phenotypic data generated by others, which may harbour inaccuracies. Functional profiling will be essential to fully understand the contribution of each of these variants in conferring AMR. Second, although we aimed for a phylogenetically diverse Validation Dataset (Fig. S1, Fig. S2), only isolates from Queensland, Australia, were included in this dataset, limiting geographic and genetic representation. Despite this shortcoming, the Validation Dataset proved extremely useful for identifying AMR variant database deficits across all three AMR tools, particularly those variants encoding AMR towards the aminoglycosides, piperacillin, cefepime, and meropenem, highlighting clear areas of need for future research efforts. Third, due to cost constraints, we only performed ciprofloxacin and meropenem ETESTs for the Validation Dataset isolates, with disc diffusions used for the remaining eight antibiotics, a less robust methodology that may have led to some minor discrepancies in antimicrobial phenotype assignments. Fourth, we did not test ARDaP's capacity to identify P. aeruginosa AMR variants from simulated or real metagenomic datasets and strain mixtures as we have demonstrated this capacity elsewhere [14,103], nor did we compare software performance against ARESdb [126] due to the proprietary nature of this database. Fifth, we recognise the importance of functional studies to fully validate new AMR variants such as those listed in Table 2, although these experiments are laborious, time-consuming, and rarely straightforward, and as such, were not conducted as part of this study. Somewhat unsurprisingly, most of these newly discovered mutations occur in genes with already well-characterised roles in driving AMR, and have been the subject of previous functional studies (e.g. oprD [22]), making their role in conferring AMR less contentious. Finally, our study would have benefitted from the inclusion of transcriptomic data [9] to identify additional novel variants associated with AMR. Although our understanding of the molecular mechanisms of AMR in P. aeruginosa is improving rapidly in the genomics era, false-negative predictions still occur across all antibiotic classes. Using ARDaP, such false-negative strains can now be rapidly identified and targeted for future functional work to pinpoint novel AMR variants and mechanisms.

Conclusions
Improved AMR diagnostics, more personalised treatment regimens, and better-informed antimicrobial stewardship measures are crucial for tackling the impending AMR crisis. To this end, we developed a comprehensive P. aeruginosa AMR database that, when used in conjunction with our freely available ARDaP software, predicts AMR towards first- and second-line anti-pseudomonal antibiotics from (meta)genomic data with > 80% accuracy. In comparison, other freely available AMR software can only predict AMR in this pathogen with ≤ 60% accuracy. Our tool generates a clinician-friendly report that predicts antimicrobial susceptibility across 10 anti-pseudomonal antibiotics, enabling it to be readily incorporated into genomics workflows to enhance the diagnosis and surveillance of emerging and circulating P. aeruginosa AMR strains. To improve AMR prediction performance, more functional work is needed to capture the full breadth of genetic and transcriptional changes driving AMR development in this superbug.

Table 2
Novel antimicrobial resistance (AMR) variants identified in Pseudomonas aeruginosa by microbial genome-wide association study (mGWAS) or comparative genomic^a analyses

^a Variant identified by comparative genomics and thus not assessed for statistical significance
^b Identified in the Validation Dataset only
^c High-consequence mutations occurring in this gene are automatically identified by ARDaP